[2.3.0] performance enhancements#60
Draft
TheAngryRaven wants to merge 12 commits into
Draft
Conversation
…k task The SD access 'mutex' was a non-atomic check-then-set on a volatile int, called from both the main loop and the Bluefruit callback task, and several BLE commands ran SdFat directly in the callback: - LIST/TLIST iterated directories for seconds while only peeking the flag - GET:/TGET: started transfers in the callback, closing a file the main loop could be mid-read() on - DELETE: took no lock at all and could remove the file being streamed Any of these could interleave with 25 Hz session logging: FAT corruption, lost logs, SPI wedge — triggerable from radio range. Fixes: - acquireSDAccess()/releaseSDAccess() now commit state transitions inside a FreeRTOS critical section (taskENTER_CRITICAL — BASEPRI-masked, so SoftDevice radio interrupts are unaffected). - The grant/deny decision table is extracted into the host-tested sd_access_policy pure unit (single source of truth for the SD_ACCESS_* values; covered by tests/sd_access_policy_test.cpp, wired into the unit-test, coverage, and clang-tidy CI jobs). - LIST/GET:/DELETE:/TLIST/TGET: are deferred to BLUETOOTH_LOOP() via a one-deep command buffer, like the settings/track/OTA commands already were — SdFat is never touched from the callback task. A second command while one is queued gets the protocol's BUSY / TERR:BUSY reply. - Directory listings hold the SD lock for the whole walk; bleDeleteFile() takes the lock and refuses with BUSY while a transfer is streaming (the idempotent same-mode re-acquire would otherwise let it delete the streamed file and then drop the transfer's lock). - Queued commands are dropped on disconnect/BLE_STOP so a stale command can't execute on the next session. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ion-mhs8yd Fix SD arbitration race (CR-1): atomic acquire + no SdFat in the BLE callback task
Coverage — host-testable units📂 Overall coverage
📄 File coverage
|
CR-4: Sleep mode was permanently defeated after the first engine pulse. The tach ISR set tachHavePeriod and nothing ever cleared it, so every sleep entry (long-press, 5-min menu idle, USB) read the stale flag and instantly bounced through exitSleepMode(true) back into race mode with logging enabled — silently starting a session and draining the pack overnight. New TACH_SLEEP() (called from enterSleepMode()) re-arms the wake trigger and drops stale ring-buffer/Kalman state so the wake's RPM computation starts clean; the ISR stays attached and the next real pulse still RPM-wakes straight into race mode as designed. H-1/H-2: Lap-time formatting was six divergent inline copies of the 60000/%1000 math with four padding behaviors. Three live pages (current/best/optimal lap) appended the millisecond zero-padding AFTER the value — 1:23.007 rendered as 1:23.700, a 693 ms error — and the lap-history list did no padding at all (1:5.7). All six sites now call one host-tested pure unit, lap_format::formatLapTime() (ms always three zero-padded digits), with three zero-minutes styles preserving each page's layout: kOmit (replay results, already-correct rendering unchanged), kShow (lap list, now a proper 0:SS.mmm fixed field), and kSpace (big-font live pages, column-stable decimal point). Covered by tests/lap_format_test.cpp including the exact H-1 regression case; wired into the unit-test, coverage, and clang-tidy CI jobs. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ormat-mhs8yd Fix sleep-mode RPM-wake latch (CR-4) and lap-time rendering (H-1/H-2)
…aster/release pin v2.3.1 beta.yml installs the lap-timer library from its BETA branch so the two beta channels move together. release.yml pins the known-good v2.3.1 tag instead of floating on the library's default-branch tip. compile-sketch picks the ref dynamically from the build's target branch (pull_request base_ref or push/dispatch ref_name): BETA-targeted builds use the library's BETA branch, everything else uses v2.3.1 — one file content works unchanged on both branches, so BETA→master merges stay clean and nothing needs flipping at release time. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
CI: DovesLapTimer per channel — BETA builds track library BETA, master/release pin v2.3.1
…0, not v2.3.1
v2.3.1 doesn't exist in the DovesLapTimer repo, so the library checkout
failed ('pathspec v2.3.1 did not match any file(s)') and the
compile-sketch jobs went down before compiling anything. Docs updated to
match.
https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…llowup-mhs8yd CI: fix DovesLapTimer pin to v4.1.0 — #62 merged before the correction landed
…ery (H-3/H-4/H-5, partial H-8) H-3: trackDetectionLoop() gated only on gpsData.fix, which stays true between PVT updates, so the O(N) software-double haversine manifest scan (several ms at the 200-entry ceiling on the single-precision-FPU M4F) ran every ~250 Hz loop iteration, collapsing the loop rate. Throttled to 1 Hz — still instant at driving pace. H-4: the tach pulse ISR advanced the ring head unconditionally. An SD GC stall (documented 100 ms–2 s — the same failure the GPS serial ring exists for) at racing RPM overruns the 16-entry buffer; lapping the consumer breaks the SPSC invariant and TACH_LOOP computes a confident-but-wrong RPM that lands in the CSV and feeds the >500 RPM auto-race trigger. The ISR now does the SPSC full check (one slot sacrificed, matching the GPS serial ISR), drops the pulse, and sets tachRingOverflow; TACH_LOOP discards the carried prev-timestamp so the one period spanning the gap is never computed and the Kalman estimate coasts. tachRingTail becomes volatile (the ISR reads it now); the wake trigger still fires on dropped pulses. H-5: GPS_BAUD_RECOVERY() runs from GPS_LOOP() under the armed ~4 s WDT but can block for up to three ~1.1 s myGNSS.begin() probes plus ~500 ms of delays — a genuinely hung module out-waits the watchdog and the one path built to recover a sick GPS becomes a reset/recovery boot loop. It now pets the WDT before each blocking probe (same treatment fwStageToFlash() already had). H-8 (partial): the tach Kalman filter math (predict/update, Q/R/floor tuning constants, period→RPM conversion) is extracted to the host-tested tach_filter pure unit per the project convention, with tests covering convergence, monotonic approach, R-scaling with period count, the uncertainty floor, and reset semantics. The OTA protocol state machine and BLE command dispatcher extractions are deliberately deferred — they are large refactors that deserve their own PRs. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…r (sibling passed in 2.5 min) https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ening-mhs8yd Fix hot-path track scan, tach ring overflow, WDT-unsafe GPS recovery (H-3/H-4/H-5, partial H-8)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rollup of the BETA hardening work driven by the professional code review — fixing both criticals that weren't slipped, three HIGHs, plus CI supply-chain pinning. Five PRs merged so far: #59, #61, #62, #65, #66.
Security
SD card arbitration race — fixable from radio range (CR-1, #59).
acquireSDAccess()was a non-atomic check-then-set on a shared flag, and five BLE commands (LIST,GET:,DELETE:,TLIST,TGET:) ran SdFat directly in the Bluefruit callback task, concurrent with 25 Hz session logging — overlapping commands could close a file mid-read, delete the file being streamed (DELETE:took no lock at all), or corrupt FAT outright. Now: lock transitions are atomic (FreeRTOS critical section; grant/deny table lives in the host-testedsd_access_policyunit), every SD-touching BLE command is deferred toBLUETOOTH_LOOP()so SdFat is single-task, listings hold the lock for the whole walk, andDELETErefuses while a transfer streams. Protocol impact: overlapping commands get the existingBUSY/TERR:BUSYreplies instead of executing concurrently.Fixed
Sleep mode actually sleeps (CR-4, #61). The tach ISR latched
tachHavePeriodon the first engine pulse since boot and nothing cleared it, so every sleep entry instantly RPM-bounced back into race mode with logging enabled, draining the pack overnight.enterSleepMode()now callsTACH_SLEEP()to re-arm the wake trigger; a real engine pulse still wakes straight into race mode.Lap times rendered wrong on the live pages (H-1/H-2, #61). Three pages appended millisecond zero-padding after the value (
1:23.007displayed as1:23.700— a 693 ms error), and the lap list did no padding at all. All six divergent inline copies of the ms→M:SS.mmmmath were replaced by the host-testedlap_formatunit (three zero-minutes styles preserve each page's layout; replay's already-correct rendering unchanged).Track detection no longer throttles the main loop (H-3, #66). The O(N) software-double haversine manifest scan was gated on
gpsData.fix(which stays true between PVT updates) and ran every ~250 Hz loop iteration. Now throttled to 1 Hz.Tach ring can't be lapped during SD stalls (H-4, #66). The pulse ISR advanced head unconditionally; a documented 100 ms–2 s SD GC stall at racing RPM overran the 16-entry ring, breaking the SPSC invariant and logging confident-but-wrong RPM that also fed the >500 RPM auto-race trigger. The ISR now checks full, drops, and flags;
TACH_LOOP()discards the one period spanning the gap.GPS baud recovery can't trip the watchdog (H-5, #66).
GPS_BAUD_RECOVERY()(up to three ~1.1 s probes + ~500 ms delays) could out-wait the 4 s WDT against a hung module — reset → recovery → reset boot loop. It now pets the WDT before each blocking probe.Changed
CI pins DovesLapTimer per channel (#62 + #65). BETA-targeted builds track the library's
BETAbranch so the two beta channels move together; master CI and release/tag builds pinv4.1.0instead of floating on the library's default-branch tip.compile-sketchselects the ref dynamically from the build's target branch, so the same file content is correct on both branches and nothing needs flipping at release time. (Master got the identical workflow commit directly via #63.)Test infrastructure
Three new host-tested pure units per the project convention —
sd_access_policy(arbitration decision table),lap_format(lap-time rendering incl. the exact H-1 regression case), andtach_filter(Kalman predict/update math, partial H-8) — all wired into the unit-test, coverage, and clang-tidy CI jobs. Host suite is now 118 test cases / 1595 assertions.Known-remaining (deliberately not in this release)
BirdsEye.inogod-file decomposition and layering — needs module-boundary decisions first.Full details per change in
CHANGELOG.md[Unreleased], which rolls into2.3.0with this merge.